This is a very important lab session. You will benefit a lot if you work through it step by step at home and read the King, Tomz, and Wittenberg (2000) article again.
But first, we take another look at interaction effects.
# The first line sets an option for the final document that can be produced from
# the .Rmd file. Don't worry about it.
knitr::opts_chunk$set(
echo = TRUE,
attr.output = 'style="max-height: 200px;"'
# collapse = TRUE
)
# The next bit is quite powerful and useful.
# First you define which packages you need for your analysis and assign it to
# the p_needed object.
p_needed <-
c("viridis", # we will use magma palette this time
"dplyr", # for preprocessing
"broom", # for tidy model output
"dagitty", # for the DAG in appendix
"ggplot2",
"scales",
"MASS" # to draw from the multivariate normal distribution
)
# Now you check which packages are already installed on your computer.
# The function installed.packages() returns a vector with all the installed
# packages.
packages <- rownames(installed.packages())
# Then you check which of the packages you need are not installed on your
# computer yet. Essentially you compare the vector p_needed with the vector
# packages. The result of this comparison is assigned to p_to_install.
p_to_install <- p_needed[!(p_needed %in% packages)]
# If at least one element is in p_to_install you then install those missing
# packages.
if (length(p_to_install) > 0) {
install.packages(p_to_install)
}
# Now that all packages are installed on the computer, you can load them for
# this project. Additionally the expression returns whether the packages were
# successfully loaded.
sapply(p_needed, require, character.only = TRUE)

## Loading required package: viridis
## Loading required package: viridisLite
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Loading required package: broom
## Loading required package: dagitty
## Loading required package: ggplot2
## Loading required package: scales
##
## Attaching package: 'scales'
## The following object is masked from 'package:viridis':
##
## viridis_pal
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## viridis dplyr broom dagitty ggplot2 scales MASS
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE
So far, we have only looked at interactions between a continuous predictor and a categorical moderating variable. We can, however, easily generalize this idea to continuous-by-continuous interactions.
The data frame ia_data contains four variables: an
outcome variable y, a continuous predictor x,
and two versions of a moderating variable: the continuous
z_con and the binary/categorical z_cat.
Below, we run two models of the form \(\hat{y} = \beta_1 + \beta_2 x + \beta_3 z + \beta_4 x z\), using the continuous and categorical moderators, respectively.
## Load data
load("raw-data/ia_data.RData")
## Model with continuous moderator
mod_con <- lm(y ~ x * z_con, data = ia_data)
summary(mod_con)

##
## Call:
## lm(formula = y ~ x * z_con, data = ia_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.954 -7.148 0.085 7.086 35.538
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.8324 0.8355 3.390 0.000713 ***
## x 2.6908 0.3768 7.141 1.29e-12 ***
## z_con 0.8038 0.3252 2.472 0.013534 *
## x:z_con 1.0975 0.1469 7.473 1.16e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.29 on 1996 degrees of freedom
## Multiple R-squared: 0.3169, Adjusted R-squared: 0.3158
## F-statistic: 308.6 on 3 and 1996 DF, p-value: < 2.2e-16
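The code chunk that produced the next summary is not shown in this excerpt. Judging from the Call line in the output below, it was the analogous model with the categorical moderator:

```r
## Model with categorical moderator
mod_cat <- lm(y ~ x * z_cat, data = ia_data)
summary(mod_cat)
```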
##
## Call:
## lm(formula = y ~ x * z_cat, data = ia_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.984 -7.207 0.014 7.344 39.591
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.1945 0.7625 4.189 2.92e-05 ***
## x 3.6484 0.3423 10.658 < 2e-16 ***
## z_cat 2.3655 1.0494 2.254 0.0243 *
## x:z_cat 2.5030 0.4725 5.298 1.30e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.73 on 1996 degrees of freedom
## Multiple R-squared: 0.2578, Adjusted R-squared: 0.2567
## F-statistic: 231.1 on 3 and 1996 DF, p-value: < 2.2e-16
Next, we want to calculate the linear prediction, \(\hat{y}\), as a function of \(x\) at different values of the moderating
variable \(z\). (Please note that this
does not mean that \(z\) is a mediator
in a causal graph. Here the term moderating variable is used to say that
the effect of \(x\) on \(y\) is influenced by \(z\).) When using z_cat, this
is easy. As before, we calculate values for two lines: one when
z_cat = 0 and one when z_cat = 1. When
using the continuous moderator z_con, things are a bit more
intricate. We need to select a range of values of z_con at
which we want to calculate the relationship between \(y\) and \(x\). Here, we use the following values:
## function for the linear prediction
pred_lm <- function (b, x, z) {
preds <- matrix(NA, length(x), length(z))
for (i in seq_len(length(z))) {
preds[, i] <- b[1] + b[2] * x + b[3] * z[i] + b[4] * x * z[i]
}
return(preds)
}
## use the function for the continuous model
b_con <- coef(mod_con)
x_vals <- seq(min(ia_data$x), max(ia_data$x), length.out = 101)
z_vals_con <- quantile(ia_data$z_con,
c(0, .01, .1, .25, .5, .75, .9, .99, 1))
preds_con <- pred_lm(b_con, x_vals, z_vals_con)
## ..and for the categorical model
b_cat <- coef(mod_cat)
z_vals_cat <- c(0, 1)
preds_cat <- pred_lm(b_cat, x_vals, z_vals_cat)

Next, we want to calculate the marginal effect for each of the predicted lines. As we know from the lecture (Week 06, slide 27), if we have a regression equation of the form \(y = \beta_1 + \beta_2 x + \beta_3 z + \beta_4 x z + \epsilon\), then the marginal effect of \(x\) can be obtained by taking the partial derivative with respect to \(x\):
\[ \underbrace{\frac{\partial y}{\partial x}}_{\text{Notation for partial derivative}} = \underbrace{\beta_2 + \beta_4 z}_{\text{Marginal effect of x}} \]
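The marginal-effect objects themselves (mfx_con, mfx_cat, and the helper mfx_lm) are used in the plotting code further down but never defined in this excerpt. They would follow directly from this derivative; a sketch consistent with those names:

```r
## function for the marginal effect of x at given values of z:
## beta_2 + beta_4 * z
mfx_lm <- function(b, z) {
  b[2] + b[4] * z
}
## use the function for both models
mfx_con <- mfx_lm(b_con, z_vals_con)
mfx_cat <- mfx_lm(b_cat, z_vals_cat)
```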
Lastly, we also want to calculate the standard errors for our estimated marginal effects of \(x\). Per the lecture slides (Week 06, slide 26), we know the formula is
\[ \hat{\sigma}_{\frac{\partial y}{\partial x}} = \sqrt{\text{var}(\hat{\beta_2}) + z^2 \text{var}(\hat{\beta_4}) + 2 z \text{cov}(\hat{\beta_2}, \hat{\beta_4})} \]
## function for the standard error of the marginal effect
se_mfx <- function (vcov, z) {
sqrt(vcov[2, 2] + z ^ 2 * vcov[4, 4] + 2 * z * vcov[4, 2])
}
## use the function for the continuous model
vcov_con <- vcov(mod_con)
se_mfx_con <- se_mfx(vcov_con, z_vals_con)
## ... and for the categorical model
vcov_cat <- vcov(mod_cat)
se_mfx_cat <- se_mfx(vcov_cat, z_vals_cat)

Using our stored objects,
we can now plot the linear predictions and the marginal effects (with confidence intervals) for both scenarios of categorical and continuous moderation side-by-side:
## set up a 2-by-2 plot (and adjust plot margins)
par(mfrow = c(2, 2),
mar = c(5.1, 6.1, 4.1, 2.1))
## Plot 1: Linear Prediction (Categorical)
col_vec <- viridis(length(se_mfx_cat)+1)
plot(
x = ia_data$x,
y = ia_data$y,
pch = 16,
xlab = "x",
ylab = "y",
type = "n",
bty = "n",
las = 1,
main = "Linear Prediction (Categorical)"
)
for (i in seq_len(ncol(preds_cat))) {
lines(
x = x_vals,
y = preds_cat[, i],
lty = i,
col = col_vec[i]
)
}
## Plot 2: Marginal Effect (Categorical)
col_vec <- viridis(length(se_mfx_cat)+1)
plot(
x = z_vals_cat,
y = mfx_cat,
pch = 16,
xlab = "z",
ylab = "", # added manually later
type = "n",
bty = "n",
las = 1,
main = "Marginal Effect (Categorical)",
xlim = c(-1, 2),
ylim = c(-2, 12),
axes = FALSE
)
abline(h = 0, col = 'gray60', lwd = .5)
axis(1,
at = z_vals_cat)
axis(2, las = 1)
# add the label for y-axis separately, horizontally
text(bquote(frac(partialdiff ~ y, partialdiff ~ x)), xpd = TRUE, x = -1.6, y = 6)
for (i in 1:length(se_mfx_cat)) {
points(
x = z_vals_cat[i],
y = mfx_cat[i],
pch = 16,
col = col_vec[i]
)
}
for (i in 1:length(se_mfx_cat)) {
segments(
z_vals_cat[i],
mfx_cat[i] + qnorm(.025) * se_mfx_cat[i],
z_vals_cat[i],
mfx_cat[i] + qnorm(.975) * se_mfx_cat[i],
col = col_vec[i]
)
}
## Plot 3: Linear Prediction (Continuous)
col_vec <- viridis(length(se_mfx_con))
plot(
x = ia_data$x,
y = ia_data$y,
pch = 16,
xlab = "x",
ylab = "y",
type = "n",
bty = "n",
las = 1,
main = "Linear Prediction (Continuous)"
)
for (i in 1:ncol(preds_con)) {
lines(
x = x_vals,
y = preds_con[, i],
lty = i,
col = col_vec[i]
)
}
## Plot 4: Marginal Effect (Continuous)
col_vec <- viridis(length(se_mfx_con))
plot(
x = z_vals_con,
y = mfx_con,
pch = 16,
xlab = "z",
ylab = "",
type = "n",
bty = "n",
axes = FALSE,
main = "Marginal Effect (Continuous)",
xlim = c(-4, 8),
ylim = c(-3, 13)
)
axis(1)
axis(2,
las = 1,
at = seq(-2,12, by = 2))
abline(h = 0, col = 'gray60', lwd = .5)
text(bquote(frac(partialdiff ~ y, partialdiff ~ x)), xpd = TRUE, x = -6.2, y = 6)
for (i in 1:length(se_mfx_con)) {
points(
x = z_vals_con[i],
y = mfx_con[i],
pch = 16,
col = col_vec[i]
)
}
for (i in 1:length(se_mfx_con)) {
segments(
z_vals_con[i],
mfx_con[i] + qnorm(.025) * se_mfx_con[i],
z_vals_con[i],
mfx_con[i] + qnorm(.975) * se_mfx_con[i],
col = col_vec[i]
)
}
## Add contiguous lines for mfx and se's to the last plot
## Compute first...
z_vals_fine <-
seq(min(ia_data$z_con), max(ia_data$z_con), length.out = 101)
mfx_fine <- mfx_lm(b_con, z_vals_fine)
se_mfx_fine <- se_mfx(vcov_con, z_vals_fine)
## ... then plot
lines(z_vals_fine, mfx_fine, col = adjustcolor("black", alpha = 0.5))
lines(
z_vals_fine,
mfx_fine + qnorm(.025) * se_mfx_fine,
lty = 2,
col = adjustcolor("black", alpha = 0.5)
)
lines(
z_vals_fine,
mfx_fine + qnorm(.975) * se_mfx_fine,
lty = 2,
col = adjustcolor("black", alpha = 0.5)
)

We use the 2013 election data set again.
Similar to the homework, we regress leftvote on
unemployment and east. Additionally, we
include a multiplicative interaction term
unemployment*east.
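The fitting code is not shown in this excerpt; per the Call line in the output below, the model was estimated along these lines (note that unemployment * east already implies the main effects, so the explicit terms are redundant but harmless):

```r
reg <- lm(leftvote ~ unemployment + east + unemployment * east, data = df)
summary(reg)
```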
##
## Call:
## lm(formula = leftvote ~ unemployment + east + unemployment *
## east, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.8438 -0.8513 -0.1683 0.5337 11.4562
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.26144 0.30817 7.338 2.11e-12 ***
## unemployment 0.54859 0.04739 11.576 < 2e-16 ***
## east 16.10218 1.43807 11.197 < 2e-16 ***
## unemployment:east -0.13650 0.14407 -0.947 0.344
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.874 on 295 degrees of freedom
## Multiple R-squared: 0.9295, Adjusted R-squared: 0.9288
## F-statistic: 1297 on 3 and 295 DF, p-value: < 2.2e-16
Review: How did we make predictions so far?
# 1. Get the coefficients.
intercept <- coef(reg)[1]
slopes <- coef(reg)[2:4]
# 2. Choose interesting covariates' values.
x_east0 <- 0
x_east1 <- 1
x_unemp <- seq(0, 20, 0.1)
# 3.1. Write your predict function.
predict_mu <- function(intercept, slopes, x1, x2) {
intercept + slopes[1] * x1 + slopes[2] * x2 + slopes[3] * x1 * x2
}
# 3.2. Let the function do the work for you.
pred_leftvote_west <-
predict_mu(intercept, slopes, x1 = x_unemp, x2 = x_east0)
pred_leftvote_east <-
predict_mu(intercept, slopes, x1 = x_unemp, x2 = x_east1)

# 4. Plot it.
# 4.1. Plot the observations.
plot(
x_unemp,
pred_leftvote_west,
type = "n",
bty = "n",
las = 1,
lwd = 2,
ylim = c(0, 30),
ylab = "Predicted Voteshare (Die Linke) in %",
xlab = "Unemployment in %",
main = "Left Voteshare and Unemployment"
)
points(df$unemployment,
df$leftvote ,
pch = 19,
col = ifelse(df$east == 1,
viridis(2, alpha = 0.3, end = 0.5)[1],
viridis(2, alpha = 0.3, end = 0.5)[2])
)
# 4.2. Add the lines on top.
lines(
x = x_unemp,
y = pred_leftvote_west,
lwd = 2,
col = viridis(2, end = 0.5)[2]
)
lines(
x = x_unemp,
y = pred_leftvote_east,
lwd = 2,
col = viridis(2, end = 0.5)[1]
)

ggplot(
data = df,
aes(x = unemployment, y = leftvote, group = as.character(east))
) +
geom_point(
aes(color = east),
alpha = 0.5 # set alpha outside aes(): it is a fixed value, not a mapping
) +
geom_line(mapping = aes(y = predict(reg),
color = east)) +
theme_classic() +
labs(
x = "Unemployment in %",
y = "Predicted Voteshare (Die Linke) in %",
color = "",
title = "Left Voteshare and Unemployment"
) +
theme(legend.position = "none")

Before we start with our simulations, meet the apply
function.
Description: “Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.”
## leftvote unemployment east
## 1 5.70 8.6 0
## 2 4.40 7.5 0
## 3 5.00 6.2 0
## 4 4.40 5.4 0
## 5 6.90 9.3 0
## 6 4.80 7.5 0
## 7 5.00 5.2 0
## 8 5.00 4.6 0
## 9 4.40 7.0 0
## 10 4.90 5.2 0
## 11 6.50 9.3 0
## 12 20.40 9.9 1
## 13 21.30 9.5 1
## 14 23.70 11.1 1
## 15 20.60 14.9 1
## 16 21.60 14.0 1
## 17 21.50 12.1 1
## 18 11.00 7.1 0
## 19 10.70 7.1 0
## 20 8.60 7.1 0
## 21 6.50 7.1 0
## 22 7.60 7.1 0
## 23 8.40 7.1 0
## 24 4.96 8.6 0
## 25 4.10 5.1 0
## 26 5.10 8.9 0
## 27 6.50 6.4 0
## 28 5.50 6.8 0
## 29 4.60 6.2 0
## 30 4.20 5.5 0
## 31 3.00 4.1 0
## 32 2.80 4.6 0
## 33 4.50 4.7 0
## 34 5.60 4.9 0
## 35 4.60 5.9 0
## 36 4.40 4.7 0
## 37 6.90 7.1 0
## 38 4.00 3.7 0
## 39 5.10 6.2 0
## 40 4.30 6.5 0
## 41 6.20 8.0 0
## 42 8.00 8.0 0
## 43 4.50 8.0 0
## 44 4.50 7.3 0
## 45 4.70 5.2 0
## 46 5.30 7.9 0
## 47 4.80 8.0 0
## 48 4.98 7.6 0
## 49 5.60 7.6 0
## 50 6.70 7.1 0
## 51 4.90 5.8 0
## 52 5.10 7.9 0
## 53 6.30 6.2 0
## 54 10.10 10.2 0
## 55 10.00 11.6 0
## 56 22.50 11.0 1
## 57 23.80 13.8 1
## 58 18.40 8.7 1
## 59 26.30 9.8 1
## 60 22.90 9.4 1
## 61 20.70 7.1 1
## 62 21.80 7.6 1
## 63 24.70 9.9 1
## 64 22.60 11.0 1
## 65 21.70 13.3 1
## 66 24.60 12.2 1
## 67 21.70 9.1 1
## 68 22.90 9.7 1
## 69 24.00 11.3 1
## 70 22.80 11.1 1
## 71 25.60 11.9 1
## 72 24.40 11.4 1
## 73 23.70 12.1 1
## 74 25.70 12.0 1
## 75 18.70 11.6 0
## 76 25.20 11.6 1
## 77 8.00 11.6 0
## 78 9.50 11.6 0
## 79 7.20 11.6 0
## 80 8.90 11.6 0
## 81 10.30 11.6 0
## 82 14.30 11.6 1
## 83 25.10 11.6 1
## 84 29.50 11.6 1
## 85 32.90 11.6 1
## 86 34.60 11.6 1
## 87 7.70 8.4 0
## 88 6.50 8.4 0
## 89 5.20 7.0 0
## 90 5.70 7.8 0
## 91 5.20 7.4 0
## 92 5.20 6.6 0
## 93 8.10 9.0 0
## 94 6.90 9.0 0
## 95 9.20 9.0 0
## 96 6.40 6.7 0
## 97 5.60 5.5 0
## 98 4.60 5.5 0
## 99 5.40 6.0 0
## 100 5.10 6.2 0
## 101 7.20 8.4 0
## 102 8.70 11.9 0
## 103 6.80 9.4 0
## 104 5.10 6.8 0
## 105 5.50 6.8 0
## 106 6.40 8.4 0
## 107 7.80 8.4 0
## 108 4.97 5.9 0
## 109 6.40 10.6 0
## 110 5.20 7.8 0
## 111 5.40 7.0 0
## 112 4.60 6.2 0
## 113 6.00 6.7 0
## 114 6.60 8.4 0
## 115 7.90 12.3 0
## 116 8.80 12.3 0
## 117 8.00 10.5 0
## 118 6.40 9.2 0
## 119 8.10 12.3 0
## 120 6.60 12.3 0
## 121 6.80 10.7 0
## 122 6.30 10.7 0
## 123 7.60 13.5 0
## 124 4.40 4.4 0
## 125 6.20 9.8 0
## 126 3.60 4.1 0
## 127 4.20 3.2 0
## 128 4.70 4.5 0
## 129 6.30 5.9 0
## 130 4.40 5.7 0
## 131 4.98 4.9 0
## 132 8.40 8.9 0
## 133 5.80 5.9 0
## 134 4.90 5.6 0
## 135 5.10 7.6 0
## 136 5.10 6.1 0
## 137 5.20 6.0 0
## 138 6.70 9.1 0
## 139 6.40 7.2 0
## 140 7.90 9.6 0
## 141 8.00 11.8 0
## 142 7.90 12.6 0
## 143 7.80 12.6 0
## 144 6.20 9.0 0
## 145 6.30 9.9 0
## 146 5.20 6.1 0
## 147 4.70 5.1 0
## 148 5.80 5.4 0
## 149 4.80 5.7 0
## 150 5.90 6.9 0
## 151 20.60 10.3 1
## 152 21.30 10.8 1
## 153 22.60 10.8 1
## 154 19.90 9.2 1
## 155 18.70 9.0 1
## 156 19.90 9.4 1
## 157 19.20 12.4 1
## 158 17.10 8.8 1
## 159 19.00 8.8 1
## 160 18.10 8.9 1
## 161 20.60 8.6 1
## 162 23.00 10.1 1
## 163 20.00 8.6 1
## 164 19.40 8.8 1
## 165 21.30 8.3 1
## 166 20.20 8.4 1
## 167 5.50 4.8 0
## 168 8.70 8.1 0
## 169 6.00 5.3 0
## 170 5.40 4.8 0
## 171 6.80 4.4 0
## 172 5.20 5.9 0
## 173 6.50 6.3 0
## 174 4.60 3.6 0
## 175 5.50 4.9 0
## 176 4.60 4.3 0
## 177 5.20 4.9 0
## 178 4.50 4.7 0
## 179 5.90 7.2 0
## 180 6.10 4.9 0
## 181 4.20 4.2 0
## 182 8.90 7.2 0
## 183 8.10 7.2 0
## 184 6.20 5.8 0
## 185 6.60 6.9 0
## 186 6.70 5.2 0
## 187 5.50 5.1 0
## 188 4.90 4.6 0
## 189 19.90 8.6 1
## 190 22.50 7.7 1
## 191 24.50 9.5 1
## 192 22.10 8.0 1
## 193 23.00 8.8 1
## 194 25.60 8.4 1
## 195 23.00 10.1 1
## 196 24.80 6.9 1
## 197 25.00 6.2 1
## 198 5.20 5.5 0
## 199 4.70 4.4 0
## 200 5.20 5.2 0
## 201 4.80 4.2 0
## 202 5.80 6.6 0
## 203 4.40 3.7 0
## 204 6.30 4.2 0
## 205 5.10 3.7 0
## 206 5.50 5.0 0
## 207 5.10 5.4 0
## 208 5.60 6.5 0
## 209 4.80 4.5 0
## 210 7.60 7.0 0
## 211 6.50 6.5 0
## 212 4.80 4.2 0
## 213 2.90 3.7 0
## 214 2.90 2.0 0
## 215 3.30 2.2 0
## 216 3.00 2.7 0
## 217 3.60 2.2 0
## 218 4.50 4.9 0
## 219 4.20 4.9 0
## 220 4.80 4.9 0
## 221 4.80 4.9 0
## 222 2.80 3.0 0
## 223 2.90 3.1 0
## 224 2.60 2.8 0
## 225 3.00 3.4 0
## 226 3.10 3.1 0
## 227 3.30 3.4 0
## 228 3.00 3.0 0
## 229 3.80 4.3 0
## 230 2.70 2.9 0
## 231 3.00 3.9 0
## 232 3.40 3.1 0
## 233 3.90 3.1 0
## 234 3.20 3.1 0
## 235 3.40 4.9 0
## 236 4.10 3.6 0
## 237 3.30 4.3 0
## 238 4.00 4.4 0
## 239 4.80 5.0 0
## 240 3.60 3.7 0
## 241 3.90 3.0 0
## 242 4.40 3.1 0
## 243 4.98 4.1 0
## 244 7.10 7.6 0
## 245 5.80 7.1 0
## 246 3.80 2.8 0
## 247 3.60 3.7 0
## 248 4.50 3.6 0
## 249 3.40 2.9 0
## 250 4.96 3.5 0
## 251 4.00 3.5 0
## 252 5.60 5.6 0
## 253 3.20 2.5 0
## 254 3.30 2.2 0
## 255 3.70 3.1 0
## 256 3.60 3.7 0
## 257 3.20 3.1 0
## 258 6.20 5.9 0
## 259 6.70 5.9 0
## 260 4.40 3.5 0
## 261 4.60 3.4 0
## 262 4.10 3.4 0
## 263 4.20 4.1 0
## 264 4.50 3.8 0
## 265 4.70 3.8 0
## 266 4.10 3.7 0
## 267 4.70 4.5 0
## 268 4.80 3.2 0
## 269 4.20 3.6 0
## 270 4.70 4.2 0
## 271 6.00 5.4 0
## 272 4.10 3.1 0
## 273 4.10 3.7 0
## 274 5.70 4.5 0
## 275 7.50 5.9 0
## 276 4.20 3.5 0
## 277 4.80 4.1 0
## 278 4.40 3.5 0
## 279 4.60 4.3 0
## 280 4.20 3.9 0
## 281 7.90 5.1 0
## 282 4.80 3.5 0
## 283 4.60 3.3 0
## 284 4.70 3.7 0
## 285 4.00 3.1 0
## 286 4.10 3.8 0
## 287 5.00 4.3 0
## 288 4.30 3.1 0
## 289 4.80 3.8 0
## 290 6.60 3.4 0
## 291 4.30 3.8 0
## 292 3.40 2.7 0
## 293 4.40 3.2 0
## 294 4.30 2.8 0
## 295 4.10 4.0 0
## 296 11.70 9.0 0
## 297 9.00 5.6 0
## 298 9.00 5.8 0
## 299 10.20 6.7 0
What are margins? MARGIN = 1 applies the function over rows, MARGIN = 2 over columns, and c(1, 2) over both.
General usage: apply(X, MARGIN, FUN, ...)
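A minimal, self-contained illustration of the MARGIN argument:

```r
m <- matrix(1:6, nrow = 2) # 2 rows, 3 columns

apply(m, 1, sum) # row sums:    9 12
apply(m, 2, sum) # column sums: 3  7 11
```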
Let’s start with column means.
## leftvote unemployment east
## 8.730870 6.768896 0.187291
Use apply to plot a histogram of each variable…
## $leftvote
## $breaks
## [1] 0 5 10 15 20 25 30 35
##
## $counts
## [1] 120 116 7 12 35 7 2
##
## $density
## [1] 0.080267559 0.077591973 0.004682274 0.008026756 0.023411371 0.004682274
## [7] 0.001337793
##
## $mids
## [1] 2.5 7.5 12.5 17.5 22.5 27.5 32.5
##
## $xname
## [1] "newX[, i]"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $unemployment
## $breaks
## [1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15
##
## $counts
## [1] 15 48 43 36 29 33 29 18 12 21 10 4 1
##
## $density
## [1] 0.050167224 0.160535117 0.143812709 0.120401338 0.096989967 0.110367893
## [7] 0.096989967 0.060200669 0.040133779 0.070234114 0.033444816 0.013377926
## [13] 0.003344482
##
## $mids
## [1] 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5 12.5 13.5 14.5
##
## $xname
## [1] "newX[, i]"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $east
## $breaks
## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
##
## $counts
## [1] 243 0 0 0 0 0 0 0 0 56
##
## $density
## [1] 8.12709 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## [10] 1.87291
##
## $mids
## [1] 0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95
##
## $xname
## [1] "newX[, i]"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
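The long printout below is not produced by any code shown in this excerpt. Its values match row-wise sums of the data (e.g. row 1: 5.70 + 8.6 + 0 = 14.30), so it presumably comes from applying sum over rows:

```r
## MARGIN = 1: apply sum() to each row of the data
apply(vars, 1, sum)
```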
## [1] 14.30 11.90 11.20 9.80 16.20 12.30 10.20 9.60 11.40 10.10 15.80 31.30
## [13] 31.80 35.80 36.50 36.60 34.60 18.10 17.80 15.70 13.60 14.70 15.50 13.56
## [25] 9.20 14.00 12.90 12.30 10.80 9.70 7.10 7.40 9.20 10.50 10.50 9.10
## [37] 14.00 7.70 11.30 10.80 14.20 16.00 12.50 11.80 9.90 13.20 12.80 12.58
## [49] 13.20 13.80 10.70 13.00 12.50 20.30 21.60 34.50 38.60 28.10 37.10 33.30
## [61] 28.80 30.40 35.60 34.60 36.00 37.80 31.80 33.60 36.30 34.90 38.50 36.80
## [73] 36.80 38.70 30.30 37.80 19.60 21.10 18.80 20.50 21.90 26.90 37.70 42.10
## [85] 45.50 47.20 16.10 14.90 12.20 13.50 12.60 11.80 17.10 15.90 18.20 13.10
## [97] 11.10 10.10 11.40 11.30 15.60 20.60 16.20 11.90 12.30 14.80 16.20 10.87
## [109] 17.00 13.00 12.40 10.80 12.70 15.00 20.20 21.10 18.50 15.60 20.40 18.90
## [121] 17.50 17.00 21.10 8.80 16.00 7.70 7.40 9.20 12.20 10.10 9.88 17.30
## [133] 11.70 10.50 12.70 11.20 11.20 15.80 13.60 17.50 19.80 20.50 20.40 15.20
## [145] 16.20 11.30 9.80 11.20 10.50 12.80 31.90 33.10 34.40 30.10 28.70 30.30
## [157] 32.60 26.90 28.80 28.00 30.20 34.10 29.60 29.20 30.60 29.60 10.30 16.80
## [169] 11.30 10.20 11.20 11.10 12.80 8.20 10.40 8.90 10.10 9.20 13.10 11.00
## [181] 8.40 16.10 15.30 12.00 13.50 11.90 10.60 9.50 29.50 31.20 35.00 31.10
## [193] 32.80 35.00 34.10 32.70 32.20 10.70 9.10 10.40 9.00 12.40 8.10 10.50
## [205] 8.80 10.50 10.50 12.10 9.30 14.60 13.00 9.00 6.60 4.90 5.50 5.70
## [217] 5.80 9.40 9.10 9.70 9.70 5.80 6.00 5.40 6.40 6.20 6.70 6.00
## [229] 8.10 5.60 6.90 6.50 7.00 6.30 8.30 7.70 7.60 8.40 9.80 7.30
## [241] 6.90 7.50 9.08 14.70 12.90 6.60 7.30 8.10 6.30 8.46 7.50 11.20
## [253] 5.70 5.50 6.80 7.30 6.30 12.10 12.60 7.90 8.00 7.50 8.30 8.30
## [265] 8.50 7.80 9.20 8.00 7.80 8.90 11.40 7.20 7.80 10.20 13.40 7.70
## [277] 8.90 7.90 8.90 8.10 13.00 8.30 7.90 8.40 7.10 7.90 9.30 7.40
## [289] 8.60 10.00 8.10 6.10 7.60 7.10 8.10 20.70 14.60 14.80 16.90
# Combining functions in a function.
multi_fun <- function(x) {
c(min = min(x),
mean = mean(x),
max = max(x))
}
# And then use the multi_fun with apply.
apply(vars, 2, multi_fun)

## leftvote unemployment east
## min 2.60000 2.000000 0.000000
## mean 8.73087 6.768896 0.187291
## max 34.60000 14.900000 1.000000
We want to quickly calculate the 2.5%, 50% and 97.5% quantiles of
the variables in the data set using the apply function. Just as with the
plots above, we can specify additional arguments to the function call in
the apply function. Here:
probs = c(0.025, 0.5, 0.975).
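The call that produced the quantile table below is not shown; it would look like this, with probs passed through apply's ... argument:

```r
apply(vars, 2, quantile, probs = c(0.025, 0.5, 0.975))
```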
Now we combine some functions to get the 2.5% and 97.5% quantiles as well as the mean. We call this function quants_mean_fun.
## leftvote unemployment east
## 2.5% 3.000 2.800 0
## 50% 5.600 6.200 0
## 97.5% 25.155 12.355 1
# Now combine some functions to get the 2.5 % and 97.5 % quantiles
# as well as the mean.
quants_mean_fun <- function(x) {
c(quants = quantile(x, probs = c(0.025, 0.975)),
mean = mean(x))
}
quants_mean <- apply(vars, 2, quants_mean_fun)
quants_mean

## leftvote unemployment east
## quants.2.5% 3.00000 2.800000 0.000000
## quants.97.5% 25.15500 12.355000 1.000000
## mean 8.73087 6.768896 0.187291
Become familiar with apply and your R code will be short and fast!
Now we get started with simulations. I hope you all read the King, Tomz, and Wittenberg (2000) article. And you remember the five steps of simulation from the lecture.
Our first goal is to get so-called expected values, \(E(Y|X)\): the average (expected) value of a variable \(Y\), conditional on a particular set of \(X\) values. For example, we could be interested in the expected vote share of the party Die Linke in a West German district with an unemployment rate of \(6.7\%\) (roughly the nationwide average unemployment rate). In mathematical terms: \(E(\text{Leftvote} \mid \text{West}, \text{Unempl.} = 6.7)\).
Let’s do this:
## (Intercept) unemployment east unemployment:east
## 0.30817130 0.04739161 1.43806978 0.14407205
E.g., the difference in the vote share of Die Linke between East and West.
Tip: double-check the ordering of the coefficients first, to put the scenario values in the correct order.
## [1] "(Intercept)" "unemployment" "east"
## [4] "unemployment:east"
A first difference is the difference between two expected values:
\[ \underbrace{FD}_{\text{First Difference}} = \underbrace{E(Y | X_{1})}_{\text{Expected Value of first scenario}} - \underbrace{E(Y | X_{2})}_{\text{Expected Value of second scenario}} \]
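The simulation code itself is missing from this excerpt. Below is a sketch of the five steps, consistent with the object names used later (S, EV_combined, fd); nsim = 1000 is inferred from output dimensions further down:

```r
## Steps 1-2: set up and draw nsim coefficient vectors from their
## estimated sampling distribution (estimation uncertainty)
nsim <- 1000
beta_hat <- coef(reg)
V_hat <- vcov(reg)
S <- mvrnorm(nsim, beta_hat, V_hat)

## Step 3: choose interesting covariate scenarios
## (unemployment at its mean, east vs. west)
X_east <- c(1, mean(df$unemployment), 1, mean(df$unemployment) * 1)
X_west <- c(1, mean(df$unemployment), 0, mean(df$unemployment) * 0)
X <- as.matrix(rbind(X_east, X_west))

## Step 4: calculate the expected value for every simulated beta
EV_combined <- S %*% t(X) # column 1: east, column 2: west

## Step 5: summarize, e.g. with a first difference
fd <- EV_combined[, 1] - EV_combined[, 2]
```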
Plot the expected values for west.
par(mfrow = c(1,2))
hist(
EV_combined[, 2],
las = 1,
col = viridis(4)[1],
border = "white",
main = "",
xlab = "Expected Values for the voteshare of Die Linke for districts in the west
(With unemployment at its mean.)",
)
# Get mean and quantiles. We use our quants_mean_fun from above.
quants_combined <- apply(EV_combined, 2, quants_mean_fun)
# Add the lines to the plot.
abline(v = c(quants_combined[, 2]),
lty = 2,
col = viridis(4)[4])
# Of course we can do the same for east.
hist(
EV_combined[, 1],
main = "",
las = 1,
col = viridis(4)[2],
border = "white",
xlab = "Expected Values for the voteshare of Die Linke for East
(With unemployment at its mean.)",
)
abline(v = c(quants_combined[, 1]),
lty = 2,
col = viridis(4)[4])
hist(
fd,
main = "",
las = 1,
col = viridis(4)[3],
border = "white",
xlab = "First Differences for the voteshare of Die Linke between East and West
(With unemployment at its mean.)"
)
# Get mean and quantiles.
quants_fd <- apply(as.matrix(fd), 2, quants_mean_fun)
# Add the lines to the plot
abline(v = quants_fd, lty = 2, col = viridis(4)[4])

# Expected Values, West
ggplot() +
geom_histogram(aes(x = EV_combined[,2]),
boundary = 5.4,
binwidth = 0.1,
color = "white",
fill = viridis(4)[1]
) +
labs(
x = "Expected Values for the voteshare of Die Linke for districts in the west
(With unemployment at its mean.)",
y = "Frequency"
) +
geom_vline(xintercept = c(quants_combined[, 2]),
color = viridis(4)[4],
linetype = "dashed") +
theme_classic() +
scale_x_continuous(breaks = c(seq(5.4, 6.4, by = 0.2)))

# Expected values, East
ggplot() +
geom_histogram(aes(x = EV_combined[, 1]),
boundary = 19,
binwidth = 0.5,
color = "white",
fill = viridis(4)[2]
) +
labs(
x = "Expected Values for the voteshare of Die Linke for districts in the east
(With unemployment at its mean.)",
y = "Frequency"
) +
geom_vline(xintercept = c(quants_combined[, 1]),
color = viridis(4)[4],
linetype = "dashed") +
theme_classic() +
scale_x_continuous(breaks = c(seq(19, 23, by = 1)))

# First Difference
ggplot() +
geom_histogram(aes(x = fd),
boundary = 19,
binwidth = 0.5,
color = "white",
fill = viridis(4)[3]
) +
labs(
x = "First Differences for the voteshare of Die Linke between East and West
(With unemployment at its mean.)",
y = "Frequency"
) +
geom_vline(xintercept = c(quants_fd),
color = viridis(4)[4],
linetype = "dashed") +
theme_classic() +
scale_x_continuous(breaks = c(seq(13, 27, by = 1)))

So far the scenarios were rather boring. Let's do something more exciting! We calculate expected values over a range of unemployment values.
We go back to Step 3.
This time we choose a range of covariate values.
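The scenario code is again missing from this excerpt; here is a sketch consistent with the dimensions printed below (1000 simulations by 201 unemployment values) and with the object names used in the plots:

```r
## Step 3 (again): this time over a whole range of unemployment
unemp <- seq(0, 20, 0.1) # 201 values
scenario_east <- cbind(1, unemp, 1, unemp * 1)
scenario_west <- cbind(1, unemp, 0, unemp * 0)

## Step 4: expected values for every simulated beta and scenario
EV_range_east <- S %*% t(scenario_east)
EV_range_west <- S %*% t(scenario_west)
dim(EV_range_east)

## Step 5: column-wise summaries with apply and quants_mean_fun
quants_range_east <- apply(EV_range_east, 2, quants_mean_fun)
quants_range_west <- apply(EV_range_west, 2, quants_mean_fun)
```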
## [1] 1000 201
# Plot
plot(
unemp,
quants_range_east[2,],
pch = 19,
cex = 0.3,
bty = "n",
las = 1,
ylim = c(0, 35),
ylab = "Voteshare (%)",
main = "Expected Voteshare (Die Linke)",
xlab = "Range of Unemployment",
type = "n"
)
# Let's add our actual observations.
points(df$unemployment,
df$leftvote,
pch = 19,
col = adjustcolor(viridis(3)[1], alpha = 0.5))
# Now we add the lines.
lines(unemp, quants_range_east[3,],
col = viridis(3)[2])
lines(unemp, quants_range_west[3,],
col = viridis(3)[3])
# Let's add those confidence intervals.
# First, for east:
lines(unemp, quants_range_east[1,], lty = "dashed", col = viridis(3)[2])
lines(unemp, quants_range_east[2,], lty = "dashed", col = viridis(3)[2])
# And for west:
lines(unemp, quants_range_west[1,], lty = "dashed", col = viridis(3)[3])
lines(unemp, quants_range_west[2,], lty = "dashed", col = viridis(3)[3])
# Add a legend
legend("topleft",
lty = "solid",
col = viridis(3)[2:3],
legend = c("East", "West"),
cex = 0.8,
bty = "n")

There are many different ways to plot confidence intervals. Pick the style you like most. For example, we can use polygons to plot the confidence intervals:
plot(
unemp,
quants_range_east[2,],
pch = 19,
cex = 0.3,
bty = "n",
las = 1,
ylim = c(0, 35),
ylab = "Voteshare (%)",
main = "Expected Voteshare (Die Linke)",
xlab = "Range of Unemployment",
type = "n"
)
points(df$unemployment,
df$leftvote,
pch = 19,
col = adjustcolor(viridis(3)[1], alpha = 0.8))
polygon(
x = c(unemp, rev(unemp)),
y = c(quants_range_east[1 ,], rev(quants_range_east[2 ,])),
col = adjustcolor(viridis(3)[2], alpha = 0.5),
border = NA
)
polygon(
x = c(unemp, rev(unemp)),
y = c(quants_range_west[1 ,], rev(quants_range_west[2 ,])),
col = adjustcolor(viridis(3)[3], alpha = 0.5),
border = NA
)
lines(unemp, quants_range_east[3,], col = viridis(3)[2]) # In this case I usually plot the polygons first and then the lines.
lines(unemp, quants_range_west[3,],
col = viridis(3)[3])
# Add a legend
legend("topleft",
lty = "solid",
col = viridis(3)[2:3],
legend = c("East", "West"),
cex = 0.8,
bty = "n")

# data frame for EV west
plot_ev_west <- data.frame(t(quants_range_west))
names(plot_ev_west) <- c("ci_lo", "ci_hi", "mean")
plot_ev_west$unemp <- unemp
plot_ev_west$east <- 0
# data frame for EV east
plot_ev_east <- data.frame(t(quants_range_east))
names(plot_ev_east) <- c("ci_lo", "ci_hi", "mean")
plot_ev_east$unemp <- unemp
plot_ev_east$east <- 1
# combine data frames
plot_ev <- rbind(plot_ev_west, plot_ev_east)
# plot
ggplot(
data = df,
aes(x = unemployment, y = leftvote,
group = east)
) +
geom_point(
color = viridis(2, 0.5)[1]
) +
# add mean expected values
geom_line(data = plot_ev, aes(x = unemp,
y = mean,
group = east)) +
# add confidence intervals
geom_line(data = plot_ev, aes(x = unemp,
y = ci_lo,
group = east),
linetype = "dashed") +
geom_line(data = plot_ev, aes(x = unemp,
y = ci_hi,
group = east),
linetype = "dashed") +
theme_classic() +
labs(
x = "Unemployment in %",
y = "Predicted Voteshare (Die Linke) in %",
color = "",
title = "Left Voteshare and Unemployment"
) +
theme(legend.position = "none")

Today's exercise session is not about generating new code, but about better understanding what we did so far. Together with your neighbor, try to answer the following questions by going through and playing around with the code we have written above.
- What does mvrnorm do?
- What are the dimensions of S? Why?
- What is stored in scenario_east?

Now we also want to simulate predicted values. What's different from expected values?
X_east <- c(1, mean(df$unemployment), 1, mean(df$unemployment) * 1) # East
X_west <- c(1, mean(df$unemployment), 0, mean(df$unemployment) * 0) # West
X <- as.matrix(rbind(X_east, X_west))

This is still the same as above.
Now we need to add something. Remember sigma_est
(i.e. \(\hat\sigma\)) from last
lab/lecture? That’s the fundamental uncertainty!
# Y ~ N(mu_c, sigma_est)
sigma_est <- sqrt(sum(residuals(reg)^2) / (nrow(df) - length(beta_hat)))
Y_hat_east <- EV_combined[,1] + rnorm(nsim, 0, sigma_est)
Y_hat_west <- EV_combined[,2] + rnorm(nsim, 0, sigma_est)
# Quantiles
quants_east <- quants_mean_fun(Y_hat_east)
quants_west <- quants_mean_fun(Y_hat_west)

Let's plot it:
# Histogram Predicted Values West Germany
hist(Y_hat_west,
las = 1,
main = "Histogram of Predicted Values (West Germany)",
col = viridis(3)[1],
border = "white")
abline(v = c(quants_west[1:3]), lty = 2, col = viridis(3)[3])

# Histogram Predicted Values East Germany
hist(Y_hat_east,
las = 1,
main = "Histogram of Predicted Values (East Germany)",
col = viridis(3)[2],
border = "white")
abline(v = c(quants_east[1:3]), lty = 2, col = viridis(3)[3])

# We could put both distributions in one plot.
plot(density(Y_hat_west),
xlim = c(0,40),
lwd = 2 ,
bty = "n",
yaxt = "n",
ylab = "",
xlab = "Left Voteshare in %",
main = "Predicted Values for Voteshare",
type = "n")
lines(density(Y_hat_west, from = min(Y_hat_west), to = max(Y_hat_west)), lwd = 2, col = viridis(3)[1])
lines(density(Y_hat_east, from = min(Y_hat_east), to = max(Y_hat_east)), lwd = 2, col = viridis(3)[2])
abline(v = c(quants_west[1:3]), lty = 2, col = viridis(3)[3])
abline(v = c(quants_east[1:3]), lty = 2, col = viridis(3)[3])

Let's also do it for a range of unemployment.
EV_range_east <- S %*% t(scenario_east)
EV_range_west <- S %*% t(scenario_west)
Y_hat_range_east <- EV_range_east + rnorm(nsim, 0, sigma_est)
Y_hat_range_west <- EV_range_west + rnorm(nsim, 0, sigma_est)
# Quantiles, we again use apply and our quants_mean_fun
Y_quants_range_east <- apply(Y_hat_range_east, 2, quants_mean_fun)
Y_quants_range_west <- apply(Y_hat_range_west, 2, quants_mean_fun)
Plot it with polygons as confidence intervals.
plot(
unemp,
Y_quants_range_east[2, ],
las = 1,
bty = "n",
pch = 19,
cex = 0.3,
ylim = c(0, 45),
ylab = "Voteshare (%)",
main = "Predicted Voteshare (Die Linke)",
xlab = "Range of Unemployment",
type = "n"
)
points(df$unemployment,
df$leftvote ,
pch = 19,
col = adjustcolor(viridis(3)[1], alpha = 0.8))
polygon(
x = c(unemp, rev(unemp)),
y = c(Y_quants_range_east[1 , ], rev(Y_quants_range_east[2 , ])),
col = adjustcolor(viridis(3)[2], alpha = 0.5),
border = NA
)
polygon(
x = c(unemp, rev(unemp)),
y = c(Y_quants_range_west[1 , ], rev(Y_quants_range_west[2 , ])),
col = adjustcolor(viridis(3)[3], alpha = 0.5),
border = NA
)
lines(unemp, Y_quants_range_east[3, ],
col = viridis(3)[2])
lines(unemp, Y_quants_range_west[3, ],
col = viridis(3)[3])
# Add a legend
legend("topleft",
lty = "solid",
col = viridis(3)[2:3],
legend = c("East", "West"),
cex = 0.8,
bty = "n")
# data frame for PV west
plot_pv_west <- data.frame(t(Y_quants_range_west))
names(plot_pv_west) <- c("ci_lo", "ci_hi", "mean")
plot_pv_west$unemp <- unemp
plot_pv_west$east <- 0
# data frame for PV east
plot_pv_east <- data.frame(t(Y_quants_range_east))
names(plot_pv_east) <- c("ci_lo", "ci_hi", "mean")
plot_pv_east$unemp <- unemp
plot_pv_east$east <- 1
# combine data frames
plot_pv <- rbind(plot_pv_west, plot_pv_east)
# plot
ggplot(
data = df,
aes(x = unemployment, y = leftvote,
group = east)
) +
geom_point(
color = viridis(2, 0.5)[1]
) +
# add mean expected values
geom_line(data = plot_pv, aes(x = unemp,
y = mean,
group = east)) +
# add confidence intervals
geom_line(data = plot_pv, aes(x = unemp,
y = ci_lo,
group = east),
linetype = "dashed") +
geom_line(data = plot_pv, aes(x = unemp,
y = ci_hi,
group = east),
linetype = "dashed") +
theme_classic() +
labs(
x = "Unemployment in %",
y = "Predicted Voteshare (Die Linke) in %",
color = "",
title = "Left Voteshare and Unemployment"
) +
theme(legend.position = "none")
The confidence bounds are wider than before because predicted values take the fundamental uncertainty of our model into account. To see this, we can plot the expected-values plot and the predicted-values plot side by side.
par(mfrow=c(1,2))
# Plot "Expected Voteshares" of Die Linke
plot(
unemp,
quants_range_east[2,],
pch = 19,
cex = 0.3,
bty = "n",
las = 1,
ylim = c(0, 45),
ylab = "Voteshare (%)",
main = "Expected Voteshare (Die Linke)",
xlab = "Range of Unemployment",
type = "n"
)
points(df$unemployment,
df$leftvote,
pch = 19,
col = adjustcolor(viridis(3)[1], alpha = 0.8))
polygon(
x = c(unemp, rev(unemp)),
y = c(quants_range_east[1 ,], rev(quants_range_east[2 ,])),
col = adjustcolor(viridis(3)[2], alpha = 0.5),
border = NA
)
polygon(
x = c(unemp, rev(unemp)),
y = c(quants_range_west[1 ,], rev(quants_range_west[2 ,])),
col = adjustcolor(viridis(3)[3], alpha = 0.5),
border = NA
)
lines(unemp, quants_range_east[3,], col = viridis(3)[2]) # In this case I usually plot the polygons first and then the lines.
lines(unemp, quants_range_west[3,],
col = viridis(3)[3])
# Add a legend
legend("topleft",
lty = "solid",
col = viridis(3)[2:3],
legend = c("East", "West"),
cex = 0.8,
bty = "n")
# Plot "Predicted Voteshares" of Die Linke
plot(
unemp,
Y_quants_range_east[2, ],
las = 1,
bty = "n",
pch = 19,
cex = 0.3,
ylim = c(0, 45),
ylab = "Voteshare (%)",
main = "Predicted Voteshare (Die Linke)",
xlab = "Range of Unemployment",
type = "n"
)
points(df$unemployment,
df$leftvote ,
pch = 19,
col = adjustcolor(viridis(3)[1], alpha = 0.8))
polygon(
x = c(unemp, rev(unemp)),
y = c(Y_quants_range_east[1 , ], rev(Y_quants_range_east[2 , ])),
col = adjustcolor(viridis(3)[2], alpha = 0.5),
border = NA
)
polygon(
x = c(unemp, rev(unemp)),
y = c(Y_quants_range_west[1 , ], rev(Y_quants_range_west[2 , ])),
col = adjustcolor(viridis(3)[3], alpha = 0.5),
border = NA
)
lines(unemp, Y_quants_range_east[3, ],
col = viridis(3)[2])
lines(unemp, Y_quants_range_west[3, ],
col = viridis(3)[3])
In your homework you will have a lot of fun with simulations.